Abstract

The need to generate new views of a 3D object from a single real image arises in several fields, including 
graphics and object recognition. While the traditional approach relies on the use of 3D models, we have 
recently introduced [11, 6, 5] techniques that are applicable under restricted conditions but simpler. The 
approach exploits image transformations that are specific to the relevant object class and learnable from 
example views of other "prototypical" objects of the same class. 

In this paper, we introduce such a new technique by extending the notion of linear class first proposed 
by Poggio and Vetter [12]. For linear object classes it is shown that linear transformations can be learned 
exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and 
then show preliminary evidence that the technique can effectively "rotate" high-resolution face images 
from a single 2D view. 
1 Introduction

View-based approaches to 3D object recognition and 
graphics may avoid the explicit use of 3D models by 
exploiting the memory of several views of the object and 
the ability to interpolate or generalize among them. In 
many situations however a sufficient number of views 
may not be available. In an extreme case we may have 
to do with only one real view. Consider for instance the 
problem of recognizing a specific human face under a dif- 
ferent pose or expression when only one example picture 
is given. Our visual system is certainly able to perform 
this task - even if at performance levels that are likely to 
be lower than expected from our introspection [10, 15]. 
The obvious explanation is that we exploit prior informa- 
tion about how face images transform, learned through 
extensive experience with other faces. Thus the key idea 
(see [12]), is to learn class-specific image-plane transfor- 
mations from examples of objects of the same class and 
then to apply them to the real image of the new object in 
order to synthesize virtual views that can be used as ad- 
ditional examples in a view-based object recognition or 
graphic system. Prior knowledge about a class of objects 
may be known in terms of invariance properties. Poggio 
and Vetter [12] examined in particular the case of bilat- 
eral symmetry of certain 3D objects, such as faces. Prior 
information about bilateral symmetry allows the synthe- 
sis of new virtual views from a single real one, thereby 
simplifying the task of generalization in recognition of 
the new object under different poses. Bilateral symme- 
try has been used in face recognition systems [5] and 
psychophysical evidence supports its use by the human 
visual system [15, 13, 18]. 

A more flexible way to acquire information about how 
images of objects of a certain class change under pose, 
illumination and other transformations, is to learn the 
possible pattern of variabilities and class-specific defor- 
mations from a representative training set of views of 
generic or prototypical objects of the same class - such as 
other faces. Although our approach originates from the 
proposal of Poggio and Brunelli [11] and of Poggio and 
Vetter [12], for countering the curse-of-dimensionality in 
applications of supervised learning techniques, similar 
approaches with different motivations have been used 
in several different fields. In computer graphics, actor- 
based animation has been used to generate sequences of 
views of a character by warping an available sequence 
of a similar character. In computer vision the approach 
closest to the first part of ours is the active shape models 
of Cootes, Taylor, Cooper and Graham [14]. They build 
flexible models of known rigid objects by linear combi- 
nation of labeled examples for the task of image search 
- recognition and localization. In all of these approaches
the underlying representation of images of the new object
is in terms of linear combinations of the shapes of
examples of other representative objects. Beymer, Shashua
and Poggio [6] as well as Beymer and Poggio [5] have 
developed and demonstrated a more powerful version 
of this approach based on non-linear learning networks 
for generating new grey-level images of the same object 
or of objects of a known class. Beymer and Poggio [5] 
also demonstrated that new textures of an object can be 



generated by linear combinations of textures of differ- 
ent objects. In this paper, we extend and introduce the 
technique of linear classes to generate new views of an 
object. The technique is similar to the approach of [5, 6] 
but more powerful since it relies less on correspondence 
between prototypical examples and the new image. 

The work described in this paper is based on the idea 
of linear object classes. These are 3D objects whose 3D 
shape can be represented as a linear combination of a 
sufficiently small number of prototypical objects. Linear 
object classes have the properties that new orthographic 
views of any object of the class under uniform affine 3D 
transformations, and in particular rigid transformations 
in 3D, can be generated exactly if the corresponding 
transformed views are known for the set of prototypes. 
Thus if the training set consists of frontal and rotated
views of a set of prototype faces, any rotated view of a
new face can be generated from a single frontal view -
provided that the linear class assumption holds. In this
paper, we show that the technique, first introduced for
shape-only objects, can be extended to their grey-level or
colour values as well, which we call texture.

Key to our approach is a representation of an object 
view in terms of a shape vector and a texture vector (see 
also Jones and Poggio [9] and Beymer and Poggio [5]). 
The first gives the image-plane coordinates of feature 
points of the object surface; the second provides their 
colour or grey-level. On the image plane the shape vec- 
tor reflects geometric transformation in the image due to 
a change in view point, whereas the texture vector cap- 
tures photometric effects, often also due to viewpoint 
changes. 

For linear object classes the new image of an object
of the class is analyzed in terms of the shape and texture
vectors of prototype objects in the same pose. This requires
correspondence to be established between all feature
points of the prototype images - both frontal and
rotated - which can be done in an off-line stage and does
not need to be automatic. It also requires correspondence
between the new image and one of the prototypes in the
same pose, but it does not need correspondence between
different poses as in the parallel deformation technique
of Poggio and Brunelli [11] and Beymer et al. [6].

The paper is organized as follows. The next section 
formally introduces linear object classes, first for objects 
defined only through their shape vector. Later in the 
section we extend the technique to objects with textures 
and characterize the surface reflectance models for which 
our linear class approach is valid. Section 3 describes an 
implementation of the technique for synthetic objects 
for which the linear class assumption is satisfied by con- 
struction. In the last section we address the key question 
of whether the assumption is a sufficiently good approx- 
imation for real objects. We consider images of faces 
and demonstrate promising results that indirectly sup- 
port the conjecture that faces are a linear class at least to 
a first approximation. The discussion reviews the main 
features of the technique and its future extensions. 



2 Linear Object Classes 

Three-dimensional objects differ in shape as well as in 
texture. In the following we will derive an object repre- 
sentation consisting of a separate texture vector and a 
2D-shape vector, each one with components referring to 
the same feature points, usually pixels. Assuming cor- 
respondence, we will represent an image as follows: we 
code its 2D-shape as the deformation field of selected 
feature points - in the limit pixels - from a reference im- 
age which serves as the origin of our coordinate system. 
The texture is coded as the intensity map of the image
with feature points, e.g. pixels, set in correspondence with
the reference image. Thus each component of the shape
vector and of the texture vector refers to the same feature
point, e.g. pixel. In this setting 2D-shape and texture can be
treated separately. We will derive the necessary and sufficient
conditions for a set of objects to be a linear object class.

2.1 Shape of 3D objects 

Consider a 3D view of a three-dimensional object, which is defined in
terms of pointwise features [12]. A 3D view can be represented by a vector
$X = (x_1, y_1, z_1, x_2, \ldots, y_n, z_n)^T$, that is by the x,y,z-coordinates
of its n feature points. Assume that $X \in \mathbb{R}^{3n}$ is the linear
combination of q 3D views $X_i$ of other objects of the same dimensionality,
such that:

$$X = \sum_{i=1}^{q} \alpha_i X_i \tag{1}$$

[Figure 1 panels: INPUT (EXAMPLES, TEST) and OUTPUT]

Figure 1: Learning an image transformation according 
to a rotation of three-dimensional cuboids from one ori- 
entation (upper row) to a new orientation (lower row). 
The 'test' cuboid (upper row right) can be represented as 
a linear combination of the two-dimensional coordinates 
of the three example cuboids in the upper row. The lin- 
ear combination of the three example views in the lower 
row, using the coefficients evaluated in the upper row, 
results in the correct transformed view of the test cuboid 
as output (lower row right). Notice that correspondence 
between views in the two different orientations is not 
needed and different points of the object may be occluded 
in the different orientations. 



X is then the linear combination of q vectors in a 3n-dimensional
space, each vector representing an object of n pointwise features.
Consider now the linear operator L associated with a desired uniform
transformation, such as for instance a specific rotation in 3D. Let us
define $X^r = LX$, the rotated 3D view of object X. Because of the
linearity of the group of uniform linear transformations $\mathcal{L}$,
it follows that

$$X^r = \sum_{i=1}^{q} \alpha_i X_i^r \tag{2}$$

Thus, if a 3D view of an object can be represented as the 
weighted sum of views of other objects, its rotated view 
is a linear combination of the rotated views of the other 
objects with the same weights. Of course for an arbitrary 
2D view that is a projection of a 3D view, a decomposi- 
tion like (1) does not in general imply a decomposition 
of the rotated 2D views (it is a necessary but not a suf- 
ficient condition). 
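
Written out, the step from equation (1) to equation (2) uses only the linearity of L. The following short derivation is a restatement of the argument above, with no additional assumptions beyond $X_i^r = L X_i$ for each prototype:

```latex
X^r = L X
    = L \sum_{i=1}^{q} \alpha_i X_i
    = \sum_{i=1}^{q} \alpha_i \, L X_i
    = \sum_{i=1}^{q} \alpha_i X_i^r
```

The coefficients $\alpha_i$ are untouched by L, which is why weights estimated in one orientation can be reused in another.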



2D projections of 3D objects

The question we want to answer here is: under which conditions do the
2D projections of 3D objects satisfy equations (1) and (2)? The answer
will clearly depend on the types of objects we use and also on the
projections we allow. We define:

A set of 3D views (of objects) $\{X_i\}$ is a linear object class under a
linear projection P if $\dim\{X_i\} = \dim\{PX_i\}$, with
$X_i \in \mathbb{R}^{3n}$, $PX_i \in \mathbb{R}^{p}$ and $p < 3n$.

This is equivalent to saying that the minimal number of basis objects
necessary to represent an object is not allowed to change under the
projection. Note that the linear projection P is not restricted to
projections from 3D to 2D, but may also "drop" occluded points. Now
assume that $x = PX$ and $x_i = PX_i$ are the projections of elements of a
linear object class with

$$x = \sum_{i=1}^{q} \alpha_i x_i \tag{3}$$

then $x^r = PX^r$ can be constructed without knowing $X^r$, using the
$\alpha_i$ of equation (3) and the given $x_i^r = PX_i^r$ of the other objects:

$$x^r = \sum_{i=1}^{q} \alpha_i x_i^r \tag{4}$$

These relations suggest that we can use "prototypical" 2D views (the
projections of a basis of a linear object class) and their known
transformations to synthesize an operator that will transform a 2D view
into a new 2D view when the object is a linear combination of the
prototypes. In other words we can compute a new 2D view of such an
object without knowing explicitly its three-dimensional structure.
Notice also that knowledge of the correspondence between equation (3)
and equation (4) is not necessary (rows in a linear equation system can
be exchanged freely). Therefore, the technique does not require
computing the correspondence between views from different viewpoints.
In fact some points may be occluded. Figure 1 shows a very simple
example of a linear object class and the construction of a new view of
an object. Taking the 8 corners of a cuboid as features, a 3D view X, as
defined above, is an element of $\mathbb{R}^{24}$; however, the dimension
of the class of all cuboids is only 3, so any cuboid can be represented
as a linear combination of three cuboids. For any projection that
preserves these 3 dimensions, we can apply equations (3) and (4). The
projection in figure 1 projects all non-occluded corners
orthographically onto the image plane ($x = PX \in \mathbb{R}^{14}$),
preserving the dimensionality. Notice that the orthographic projection
of an exactly frontal view of a cuboid, which would result in a
rectangle as image, would preserve 2 dimensions only, so equation (4)
could not guarantee the correct result.
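
The cuboid example can be reproduced numerically. The following is a minimal sketch, not the implementation used for figure 1: the helper names (`cuboid`, `rotation`, `view`), the two viewing angles, the prototype dimensions and the choice of which corner is "occluded" in each view are all assumptions made for illustration. The coefficients are estimated in the first orientation only and then applied to the prototype views in the second orientation.

```python
import numpy as np

def cuboid(a, b, c):
    """8 corners of an axis-aligned cuboid centred at the origin, shape (8, 3)."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return 0.5 * signs * np.array([a, b, c])

def rotation(yaw_deg, pitch_deg):
    """Rotation about the y axis followed by a rotation about the x axis."""
    y, p = np.radians([yaw_deg, pitch_deg])
    Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    return Rx @ Ry

def view(corners, R, hidden):
    """Orthographic 2D view: rotate, drop z, drop one (assumed) occluded corner."""
    xy = (corners @ R.T)[:, :2]
    return np.delete(xy, hidden, axis=0).ravel()      # vector in R^14

R1, R2 = rotation(30, 20), rotation(-40, 25)           # two oblique orientations
protos = [cuboid(1, 1, 1), cuboid(2, 1, 1), cuboid(1, 3, 2)]
test = cuboid(1.5, 2.0, 0.7)

# Prototype views as columns; a different corner is dropped in each orientation,
# so no correspondence between the two orientations is used.
Phi1 = np.column_stack([view(p, R1, hidden=0) for p in protos])
Phi2 = np.column_stack([view(p, R2, hidden=5) for p in protos])

# Equation (3): coefficients from the first orientation only.
alpha, *_ = np.linalg.lstsq(Phi1, view(test, R1, hidden=0), rcond=None)

# Equation (4): the same coefficients applied in the second orientation.
predicted = Phi2 @ alpha
truth = view(test, R2, hidden=5)

# Degenerate case from the text: an exactly frontal view keeps only 2 dimensions.
Phi_frontal = np.column_stack([view(p, np.eye(3), hidden=0) for p in protos])
```

In this sketch `predicted` agrees with `truth` up to rounding, whereas `Phi_frontal` has rank 2, so the coefficients could not be recovered uniquely from a frontal view.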
Before applying this idea to grey-level images, we would like to
introduce a helpful change of coordinate systems in equations (3) and
(4). Instead of using an absolute coordinate system, we represent the
views as the difference to the view of a reference object of the same
class, in terms of the spatial differences of corresponding feature
points in the images. Subtracting on both sides of equations (3) and (4)
the projection of a reference object gives us

$$\Delta x = \sum_{i=1}^{q} \alpha_i \Delta x_i \tag{5}$$

$$\Delta x^r = \sum_{i=1}^{q} \alpha_i \Delta x_i^r \tag{6}$$

After this change in the coordinate system, equation (6) now evaluates
to the new difference vector to the rotated reference view. The new view
of the object can be constructed by adding this difference to the
rotated reference view.


2.2 Texture of 3D objects 

In this section we extend our linear space model from 
a representation based on feature points to full images 
of objects. In the following we assume that the objects 
are isolated, that is properly segmented from the back- 
ground. To apply equations (5) and (6) to images, the 
difference vectors between an image of a reference object 
and the images of the other objects have to be computed. 
Since the difference vectors reflect the spatial difference 
of corresponding pixels in images, this correspondence 
has to be computed first. The problem of finding corre- 
spondence between images in general is difficult and out- 
side the scope of this paper. In the following we assume 
that the correspondence is given for every pixel in the 
image. In our implementation (see next section) we approximated
these correspondence fields using a standard optical flow technique.
For an image of n-by-n pixels, the $\Delta x$ in equations (5) and (6)
are the correspondence fields of the images to a reference image, with
$\Delta x \in \mathbb{R}^{2n^2}$.

The computed correspondence between images en- 
ables a representation of the image that separates 2D- 
shape and texture information. The 2D-shape of an im- 
age is coded as a vector representing the deformation 
field relative to a reference image. The texture informa- 
tion is coded in terms of a vector which holds for each 
pixel the texture map that results from mapping the im- 
age onto the reference image through the deformation 
field. In this representation, all images - the shape vec- 
tor and the texture vector - are vectorized relative to the 
reference image. Since the texture or image irradiance 
of an object is in general a complex function of albedo, 
surface orientation and the direction of illumination, we 
have to distinguish different situations. 
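
The vectorized representation can be sketched in a few lines. This is an illustrative simplification, not the authors' code: the function name `vectorize`, the array layout `flow[y, x] = (dx, dy)` and the nearest-neighbour lookup are assumptions. The flow is taken to be defined at every reference pixel and to point to the corresponding location in the image.

```python
import numpy as np

def vectorize(image, flow):
    """Split an n-by-n image into a shape vector and a texture vector.

    flow[y, x] = (dx, dy) is the assumed displacement from reference pixel
    (x, y) to the corresponding point in `image`.
    """
    n = image.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    # Nearest-neighbour lookup of the corresponding image location (a simplification).
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, n - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, n - 1)
    shape_vec = flow.reshape(-1)                    # length 2 * n**2
    texture_vec = image[src_y, src_x].reshape(-1)   # length n**2, in reference coordinates
    return shape_vec, texture_vec

# Toy usage: with a zero flow field the texture vector is the image itself.
img = np.arange(16.0).reshape(4, 4)
shape_vec, texture_vec = vectorize(img, np.zeros((4, 4, 2)))
```

Both vectors are indexed by reference pixels, which is what allows shape and texture to be decomposed separately.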

Let us first consider the easy case of objects all with the same
identical texture: corresponding pixels in each image have the same
intensity or color. In this situation a single texture map (e.g. the
reference image) is sufficient. Assuming a linear object class as
described earlier, the shape coefficients $\alpha_i$ can be computed
(equation 5) and yield (equation 6) the correspondence field from the
reference image in the second orientation to the new 'virtual' image.
To render the 'virtual' image, the reference image has to be warped
along this correspondence field. In other words the reference image
must be mapped onto the image locations given through the
correspondence field. In Figure 2 the method is applied to
grey level images of three-dimensional computer graphic 
models of five dog-like objects. The 'dogs' are shown in 
two orientations and four examples of this transforma- 
tion from one orientation to the other are given. Only a 
single test view of a different dog is given. In each orien- 
tation, the correspondence from a chosen reference image 
(dashed box) to the other images is computed separately 
(see also section 'An implementation'). Since the dogs 
were created in such a way that the three-dimensional 
objects form a linear object class, the correspondence 
field to the test image could be decomposed exactly into 
the other fields (upper row). Applying the coefficients 
of this decomposition to the correspondence fields of the 
second orientation results in the correspondence of the 
reference image to a new image, showing the test object 
in the second orientation. This new image ("output" in 
the lower row) was created by simply warping the ref- 
erence image along this correspondence field, since all 
objects had the same texture. Since in this test a three- 
dimensional model of the object was available, the syn- 
thesized output could be compared to the model. As 
shown in the difference image, there is only a small er- 
ror, which can be attributed to minor errors in the cor- 
respondence step. This example shows that the method 
combined with standard image matching algorithms is 
able to transform an image in a way that shows an ob- 
ject from a new viewpoint. 

Figure 2: Grey level images of an artificial linear object class are rendered. The correspondence between the images
of a reference object (dashed box) and the other examples is computed separately for each orientation. The
correspondence field between the test image and the reference image is computed and linearly decomposed into the other
fields (upper row). A new correspondence field is synthesized by applying the coefficients from this decomposition to the
fields from the reference image to the examples in the lower row. The output is generated by forward warping the
reference image along this new correspondence field. In the difference image between the new image and the image of
the true 3D model (lower row, right), the missing parts are marked white whereas the parts not existing in an image
of the model are in black.

Let us next consider the situation in which the texture is a function
of albedo only, that is independent of the surface normal. Then a linear
texture class can be formulated in a way equivalent to equations (1)
through (4). This is possible since the textures of all objects were
mapped along the computed deformation fields onto the reference image,
so all corresponding pixels in the images are mapped to the same pixel
location in the reference image. The equation

$$t = \sum_{i=1}^{q} \beta_i t_i \tag{7}$$

with $\beta_i$ (different from the $\alpha_i$ in equation (3)) implies

$$t^r = \sum_{i=1}^{q} \beta_i t_i^r \tag{8}$$

assuming that the appearance of the texture is independent of the
surface orientation and the projection does not change the
dimensionality of the texture space. Here we are in the nice situation
of a separate shape and texture space. In an application the
coefficients $\alpha_i$ for the shape and the coefficients $\beta_i$ for
the texture can be computed separately. In face recognition experiments
[5] the coefficients $\beta_i$ were already used to generate a new
texture of a face using textures of different faces. Figure 3 shows a
test of this linear approach for a separated 2D-shape and texture space
in combination with the approximated correspondence. Three example
faces are shown, each from two different viewpoints corresponding to a
rotation of 22.5°. Since the class of all faces has more than three
dimensions, a synthetic face image is used to test the method. This
synthetic face is generated by a standard morphing technique [1]
between the two upper left images. This ensures that the necessary
requirements for the linear class assumption hold, that is, the test
image is a linear combination of the example images in texture and
2D-shape. In the first step, for each orientation the correspondence
between a reference face (dashed box) and the other faces is computed.
Using the same procedure described earlier, the correspondence field to
the test image is decomposed into the other fields, evaluating the
coefficients $\alpha_i$. In contrast to figure 2, the textures are
mapped onto the reference face. Now the texture of the test face can be
linearly decomposed into the textures of the example faces. Applying
the resulting coefficients $\beta_i$ to the textures of the example
faces in the second orientation (lower row of figure 3), we generate a
new texture mapped onto the reference face. This new texture is now
warped along the new correspondence field. This new field is evaluated
by applying the coefficients $\alpha_i$ to the correspondence fields of
the examples to the reference face in the second orientation. The
output of this procedure is shown below the test image. Since the input
is synthetic, this result cannot be compared to the true rotated face,
so it is up to the observer to judge the quality of the applied
transformation of the test image.
Figure 3: Three human example faces are shown, each in two orientations (the three left columns); one of these faces
is used as reference face (dashed box). A synthetic face, a 'morph' between the two upper left images, is used as a
test face to ensure the linear combination constraint (upper right). The procedure of decomposing and synthesizing
the correspondence fields is as described in figure 2. Additionally all textures, for each orientation separately, are
mapped onto the reference face. Here the test texture is decomposed into the other example textures. Using the
evaluated coefficients a new texture is synthesized for the second orientation on the reference face. The final output,
the transformed test face, is generated by warping this new texture along the new synthesized correspondence field.



3 An Implementation 

The implementation of this method for grey-level pixel 
images can be divided into three steps. First, the corre- 
spondence between the images of the objects has to be 
computed. Second, the correspondence field to the new 
image has to be linearly decomposed into the correspon- 
dence fields of the examples. The same decomposition 
has to be carried out for the new texture in terms of 
the example textures. And finally we synthesize the new 
image, showing the object from the new viewpoint. 

3.1 Computation of the Correspondence 

To compute the differences $\Delta x$ used in equations (5) and (6),
which are the spatial distances between corresponding points of the
objects in the images, the correspondence of these points has to be
established first. That means we have to find for every pixel location
in an image, e.g. a pixel located on the nose, the corresponding pixel
location on the nose in the other image. This is in general a hard
problem. However, since all objects compared here are in the same
orientation, we can often assume that the images are quite similar and
that occlusion problems should usually be negligible. These conditions
make it feasible to compare the images of the different objects with
automatic techniques. Such algorithms are known from optical flow
computation, in which points have to be tracked from one image to the
other. We use a coarse-to-fine gradient-based method [2] and follow an
implementation described in [3]. For every point x,y in an image I, the
error term $E = \sum (I_x \delta x + I_y \delta y - \delta I)^2$ is
minimized for $\delta x$, $\delta y$, with $I_x$, $I_y$ being the spatial
image derivatives and $\delta I$ the difference of intensity of the two
compared images. The coarse-to-fine strategy refines the computed
displacements when finer levels are processed. The final result of this
computation $(\delta x, \delta y)$ is used as an approximation of the
spatial displacement ($\Delta x$ in equations (5) and (6)) of a pixel
from one image to the other. The correspondence is computed in the
direction towards the reference image from the example and the test
images. As a consequence all vector fields have a common origin at the
pixel locations of the reference image.
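
To make the error term concrete, the following sketch minimizes it for a single global displacement at one resolution level. This is a deliberate simplification of the coarse-to-fine, per-pixel method of [2, 3], not a reimplementation of it. The function names, the synthetic Gaussian test image and the sign convention (the second image is the first one displaced by d, so that the intensity difference is `ref - img`) are assumptions.

```python
import numpy as np

def estimate_shift(ref, img):
    """Least-squares estimate of one global displacement (dx, dy), with img(p) ~ ref(p - d)."""
    Iy, Ix = np.gradient(ref)            # derivatives along rows (y) and columns (x)
    dI = ref - img                       # intensity difference between the two images
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    # Minimises sum (Ix*dx + Iy*dy - dI)**2 over all pixels.
    d, *_ = np.linalg.lstsq(A, dI.ravel(), rcond=None)
    return d

def blob(cx, cy, n=41, sigma=5.0):
    """Smooth synthetic test image: a Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:n, 0:n]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

ref = blob(20.0, 20.0)
img = blob(20.3, 19.8)                   # same blob displaced by (0.3, -0.2) pixels
d = estimate_shift(ref, img)             # close to (0.3, -0.2) for this small shift
```

The linearization only holds for small displacements, which is the motivation for refining the estimate from coarse to fine levels.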

3.2 Learning the Linear Transformation 

The decomposition of a given correspondence field in equation (5) and
the composition of the new field in equation (6) can be understood as a
single linear transformation. First, we compute the coefficients
$\alpha_i$ for the optimal decomposition (in the least-squares sense).
We decompose an "initial" field $\Delta x$ to a new object X into the
"initial" fields $\Delta x_i$ to the q given prototypes by minimizing

$$\Big\| \Delta x - \sum_{i=1}^{q} \alpha_i \Delta x_i \Big\|^2 \tag{9}$$

We rewrite equation (5) as $\Delta x = \Phi \alpha$, where $\Phi$ is the
matrix formed by the q vectors $\Delta x_i$ arranged column-wise and
$\alpha$ is the column vector of the $\alpha_i$ coefficients.
Minimizing equation (9) gives

$$\alpha = \Phi^{+} \Delta x \tag{10}$$

The observation of the previous section implies that the operator L
that transforms $\Delta x$ into $\Delta x^r$ through
$\Delta x^r = L \Delta x$ is given by

$$\Delta x^r = \Phi^r \alpha = \Phi^r \Phi^{+} \Delta x, \qquad L = \Phi^r \Phi^{+} \tag{11}$$

and thus can be learned from the 2D example pairs
$(\Delta x_i, \Delta x_i^r)$. In this case, a one-layer, linear network
(compare Hurlbert and Poggio, 1988) can be used to learn the
transformation L. L can then transform a view of a novel object of the
same class. If the q examples are linearly independent, $\Phi^{+}$ is
given by $\Phi^{+} = (\Phi^T \Phi)^{-1} \Phi^T$; in the other cases
equation (9) was solved by an SVD algorithm.

Before decomposing the new texture into the example textures, all
textures have to be mapped onto a common basis. Using the
correspondence, we warped all images onto the reference image. In this
representation the decomposition of the texture can be performed as
described above for the correspondence fields.
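
As an illustration of equations (10) and (11), the sketch below learns the operator L from example pairs and applies it to a novel member of the class. The data are synthetic and hypothetical: random 3D point sets serve as prototypes, the second orientation is a 22.5° rotation about the vertical axis, and for brevity the vectors are the projected views themselves rather than difference vectors to a reference. `np.linalg.pinv` computes the pseudoinverse via SVD, so it also covers linearly dependent examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 12, 4                                     # feature points, prototypes

# Hypothetical linear class: q random 3D prototypes, each an (n, 3) point set.
protos = rng.normal(size=(q, n, 3))
theta = np.radians(22.5)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])

def project(points, rot=np.eye(3)):
    """Rotate, then project orthographically (drop z); returns a vector in R^(2n)."""
    return (points @ rot.T)[:, :2].ravel()

Phi = np.column_stack([project(p) for p in protos])         # first orientation
Phi_r = np.column_stack([project(p, R) for p in protos])    # second orientation

# Equation (11): L = Phi^r Phi^+, learned from the example pairs only.
L = Phi_r @ np.linalg.pinv(Phi)

# A novel object that is, by construction, a linear combination of the prototypes.
alpha_true = rng.normal(size=q)
new_object = np.tensordot(alpha_true, protos, axes=1)        # shape (n, 3)
x, x_r_true = project(new_object), project(new_object, R)

x_r_pred = L @ x            # rotated view predicted from the single input view
```

Because the test object lies in the span of the prototypes and the projection preserves the dimension of the class, `x_r_pred` matches `x_r_true` up to rounding. The same operator construction applies to texture vectors once they are mapped onto the reference image.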

Before decomposing the new texture into the example 
textures, all textures have to be mapped onto a common 
basis. Using the correspondence, we warped all images 
onto the reference image. In this representation the de- 
composition of the texture can be performed as described 
above for the correspondence fields. 

3.3 Synthesis of the New Image

The final step is image rendering. Applying the com- 
puted coefficients to the examples in the second orien- 
tation results in a new texture and the correspondence 
fields to the new image. The new image can be generated 
combining this texture and correspondence field. This is 
possible because both are given in the coordinates of the 
reference image. That means that for every pixel in the 
reference image the pixel value and the vector pointing 
to the new location are given. The new location gen- 
erally does not coincide with the equally spaced grid of 
pixels of the destination image. A commonly used solu- 
tion of this problem is known as forward warping [19]. 
For every new pixel, we use the nearest three points to 
linearly approximate the pixel intensity. 
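
A minimal sketch of forward warping is given below. It is not the scheme described above: instead of interpolating each destination pixel linearly from its three nearest warped points, it simply moves every reference pixel to the nearest destination grid position and leaves unfilled pixels as NaN. The function name and the array layout `flow[y, x] = (dx, dy)` are assumptions.

```python
import numpy as np

def forward_warp(texture, flow):
    """Move each reference pixel to its new location (nearest-neighbour splatting).

    texture: (n, n) grey values in reference coordinates.
    flow:    (n, n, 2) displacements (dx, dy) from the reference to the new image.
    Destination pixels that receive no value are left as NaN.
    """
    n = texture.shape[0]
    out = np.full((n, n), np.nan)
    ys, xs = np.mgrid[0:n, 0:n]
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    inside = (tx >= 0) & (tx < n) & (ty >= 0) & (ty < n)
    out[ty[inside], tx[inside]] = texture[inside]
    return out

# Toy usage: shift a 3-by-3 texture one pixel to the right.
tex = np.arange(9.0).reshape(3, 3)
shift_right = np.zeros((3, 3, 2))
shift_right[..., 0] = 1.0
warped = forward_warp(tex, shift_right)
```

The NaN holes in `warped` mark exactly the destination pixels for which an interpolation step, such as the three-point scheme above, is needed.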

4 Is the linear class assumption valid for real objects?

For man-made objects, which often consist of cuboids, cylinders or
other geometric primitives, the assumption of linear object classes
seems almost natural. However, are there other object classes which can
be linearly represented by a finite set of example objects? In the case
of faces it is not clear how many example faces are necessary to
synthesize any other face and, in fact, it is unclear whether the
assumption of a linear class is appropriate at all.
The key test for the linear class hypothesis in this case is 
how well the synthesized rotated face approximates the 
"true" rotated face. We tested our approach on a small 
set of 50 faces, each given in two orientations (22.5° and 
0°). Figure 4 shows four tests using the same technique 
as described in figure 3. In each case one face was se- 
lected as test face and the 49 remaining faces were used 
as examples. Each test face is shown on the upper left 
and the output image produced by our technique on the 
lower right, showing a rotated test face. The true ro- 
tated test face from the data base is shown on the lower 
left. We also show in the upper right the synthesis of 
the test face through the 49 example faces in the test 
orientation. This reconstruction of the test face should 
be understood as the projection of the test face into the 
shape and texture space of the other 49 example faces. 
A perfect reconstruction of the test face would be a necessary (not
sufficient!) requirement for the 50 faces to be a linear object class.
The results are not perfect but, considering the small size of the
example set, the reconstruction is quite good. The similarity of the
reconstruction to the input test face allows us to speculate that an
example set on the order of a hundred faces may be sufficient to
construct a huge variety of different faces. We conclude that the
linear object class approach may be a satisfactory approximation even
for complex objects such as faces. On the other hand it is obvious that
the reconstruction of every specific mole or wrinkle in a face would
require an almost infinite number of examples. To overcome this
problem, correspondence between images taken from different viewpoints
should be used to map the specific texture onto the new orientation
[9, 5].

5 Discussion 

Linear combinations of images of a single object have
already been used successfully to create a new image of
that object [16]. Here we created a new image of an
object using linear combinations of images of different 
objects of the same class. Given only a single image of 
an object, we are able to generate additional synthetic 
images of this object under the assumption that the "lin- 
ear class" property holds. This is demonstrated not only 
for objects purely defined through their shape but also 
for smooth objects with texture. 

This approach based on two-dimensional models does not need any depth
information, so the sometimes difficult step of generating
three-dimensional models from two-dimensional images is superfluous.
Since no correspondence is necessary between images representing
objects in different orientations, fully automated algorithms can be
applied for the correspondence finding step. For object recognition
tasks our approach has several implications. Our technique can provide
additional artificial example images of an object when only a single
image is given. On the other hand the coefficients, which result from a
decomposition of shape and texture into example shapes and textures,
already give us a representation of the object which is invariant under
any affine transformation.

Figure 4: Four examples of artificially rotated human faces, using the technique described in figure 3, are shown.
Each test face (upper left) is "rotated" using 49 different faces (not shown) as examples; the results are marked as
output. Only for comparison the "true" rotated test face is shown on the lower left (this face was not used in the
computation). The difference between synthetic and real rotated face is due to the incomplete example set, since the
same difference can already be seen in the reconstruction of the input test face using the 49 example faces (upper
right).

In an application our approach is confronted with two 
types of problems. As in any approach based on flexible 
models, there is the problem of finding the correspon- 
dence between model and image. In our implementa- 
tion we used a general method for finding this corre- 
spondence. However, if the class of objects is known in 
advance, a method specific to this object class could be 
used [9, 7]. In this case the correspondence field is lin- 
early modeled by a known set of deformations specific to 
that class of objects. 

A second problem, specific to our approach, is the existence
of linear object classes and the completeness of
the available examples. This is equivalent to the question
of whether object classes defined in terms of human
perception can be modeled through linear object classes.
Presently there is no final answer to this question, apart
from simple objects (e.g. cuboids, cylinders), where
the dimensionality is given through their mathematical
definition. The application of the method to a small
example set of human faces, shown here, provides promising
preliminary results, at least for some faces. It is,
however, clear that 50 example faces are not sufficient
to model all human faces accurately. Since our linear
model allows us to test the necessary conditions for an
image being a member of a linear object class, the model
can detect images where a transformation fails. This test
can be done by measuring the difference between the input
image and its projection into the example space,
which should ideally vanish.
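
The membership test described above can be illustrated with a small sketch. It assumes that views are column vectors and that the projection into the example space is computed by least squares. The function name `projection_residual` and the synthetic data are hypothetical, and a usable acceptance threshold would have to be chosen empirically.

```python
import numpy as np

def projection_residual(x, examples):
    """Relative distance between x and its projection onto span(examples)."""
    # Coefficients of the orthogonal projection of x onto the example space.
    alpha, *_ = np.linalg.lstsq(examples, x, rcond=None)
    projection = examples @ alpha
    return np.linalg.norm(x - projection) / np.linalg.norm(x)

rng = np.random.default_rng(1)
examples = rng.normal(size=(100, 10))        # 10 example views, 100 features each
member = examples @ rng.normal(size=10)      # lies inside the example space
outsider = rng.normal(size=100)              # generic vector, mostly outside it

r_member = projection_residual(member, examples)
r_outsider = projection_residual(outsider, examples)
print(r_member, r_outsider)
```

A residual near zero indicates that the input is well represented by the examples. A large residual signals an image for which the learned transformation is likely to fail.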

Our implementation, as described in our examples, can
be improved by applying the linear class idea to independent
parts of the objects. In the face case, a new
input face was linearly approximated through the complete
example faces; that is, for each example face a single
coefficient (for texture and 2D-shape separately) was
computed. If noses, mouths or eyes span separate
linear subspaces, then the dimensionality of the
space spanned by the examples is multiplied by the
number of subspaces. In a new image the different
parts can then be approximated separately by the examples,
which increases the number of coefficients used as
representation and also improves the reconstruction.
Several open questions remain for a fully automated
implementation. The separation of an object into parts that
form separate subspaces could be done by computing
the covariance between the pixels of the example images.
However, for images at high resolution, this may need
thousands of example images. Our linear object class
approach also assumes that the orientation of an object
in an image is known. The orientation of faces can be
approximated by computing the correlation of a new image
with templates of faces in various orientations [4]. It is not
clear how precisely the orientation must be estimated
to yield satisfactory results.
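
As a rough illustration of orientation estimation by template correlation, the following sketch picks the orientation whose template has the highest normalized correlation with the input image. It is not the method of [4]. It assumes that the image and all templates are already aligned and of equal size, and it uses random arrays in place of face templates. The name `estimate_orientation` and the angle labels are hypothetical.

```python
import numpy as np

def estimate_orientation(image, templates):
    """Return the label of the template that correlates best with image.

    templates : dict mapping an orientation label (e.g. degrees) to an
                array with the same shape as image
    """
    def normalized_correlation(a, b):
        # Correlation coefficient of the two pixel arrays.
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = {angle: normalized_correlation(image, t)
              for angle, t in templates.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(2)
templates = {angle: rng.normal(size=(16, 16)) for angle in (-30, 0, 30)}
# Noisy copy of one template stands in for a new face image.
image = templates[30] + 0.1 * rng.normal(size=(16, 16))
best, scores = estimate_orientation(image, templates)
print(best)
```

The estimate is only as fine as the spacing of the template orientations, which relates directly to the open question of how precisely the orientation must be known.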

Appendix 

A Decomposing objects into parts 

In the previous section we considered learning the appropriate
transformation from full views. In this case,
the examples (prototypes) must have the same dimensionality
as a full view. Our arguments above show that
dimensionality determines the number of example pairs
needed for a correct transformation. This section suggests
that components of an object, i.e. subsets of
the full set of features, that are elements of the same
object class may be used to learn a single transformation
with a reduced number of examples, because of the
smaller dimensionality of each component. We rewrite
equation (1) as $X = \Phi\alpha$, where $\Phi$ is the matrix formed
by the $q$ vectors $X_i$ arranged column-wise and $\alpha$ is the
column vector of the coefficients $\alpha_i$. The basic components
into which a view can be decomposed are given by
the irreducible submatrices $\Phi^{(i)}$ of the structure matrix
$\Phi$, so that $\Phi = \Phi^{(1)} \oplus \cdots \oplus \Phi^{(k)}$. Each submatrix $\Phi^{(i)}$
represents an isolated object class, formed by a subset
of feature points, which we would like to call a part of
an object. As an example, for objects composed of two
cuboids, in general six examples would be necessary, since
all 3D views of objects composed of two cuboids span a
six-dimensional space (we suppose a fixed angle between
the cuboids). However, this space $\Phi$ is the direct sum
$\Phi = \Phi^{(1)} \oplus \Phi^{(2)}$ of two three-dimensional subspaces, so
three examples are sufficient. Notice that $\Phi^{(1)}$ and $\Phi^{(2)}$
are only identical when both are in the same orientation.
This shows that the problem of transforming the
2D view $x$ of the 3D objects $X$ into the transformed 2D
views $x^r$ can be treated separately for each component
$x^{(k)}$.
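
The two-part example can be checked numerically with a small sketch. It assumes two parts that each span a three-dimensional linear class with a randomly chosen basis, standing in for the two cuboids, and it uses only three example objects. All variable names and the helper `residual` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, q = 12, 12, 3            # features per part, number of examples

# Each part is its own three-dimensional linear class (random basis assumed).
basis1 = rng.normal(size=(d1, 3))
basis2 = rng.normal(size=(d2, 3))

# Three example objects; each part carries its own coefficients.
ex1 = basis1 @ rng.normal(size=(3, q))
ex2 = basis2 @ rng.normal(size=(3, q))
examples = np.vstack([ex1, ex2])  # full views stack both parts

# A new object whose two parts are chosen independently of each other.
new1 = basis1 @ rng.normal(size=3)
new2 = basis2 @ rng.normal(size=3)
new = np.concatenate([new1, new2])

def residual(x, A):
    """Norm of the least-squares error when approximating x by columns of A."""
    alpha, *_ = np.linalg.lstsq(A, x, rcond=None)
    return np.linalg.norm(x - A @ alpha)

full_residual = residual(new, examples)                     # one coefficient set
part_residual = residual(new1, ex1) + residual(new2, ex2)   # one set per part
print(full_residual, part_residual)
```

With a single coefficient vector for the full view, three examples cannot represent the six-dimensional space, so the residual stays clearly above zero. Fitting each part separately drives the residual to numerical zero with the same three examples.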